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Abstract: Dynamic factor models have a wide range of applications in econometrics 
and applied economics. The basic motivation resides in their capability of reducing a 
large set of time series to only few indicators (factors). If the number of time series is 
large compared to the available number of observations then most information may 
be conveyed to the factors. This way low dimension models may be estimated for 
explaining and forecasting one or more time series of interest. It is desirable that outlier 
free time series be available for estimation. In practice, outlying observations are likely 
to arise at unknown dates due, for instance, to external unusual events or gross data 
entry errors. Several methods for outlier detection in time series are available. Most 
methods, however, apply to univariate time series while even methods designed for 
handling the multivariate framework do not include dynamic factor models explicitly. 
A method for discovering outliers occurrences in a dynamic factor model is introduced 
that is based on linear transforms of the observed data. Some strategies to separate 
outhers that add to the model and outliers within the common component are discussed. 
Applications to simulated and real data sets are presented to check the effectiveness of 
the proposed method. 
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1. Introduction 



Dynamic factor models have been introduced to explain and forecast time series of 
interest in the presence of a large set of explanatory time series. In practice, useful- 
ness of dynamic factor models is apparent when the dimension is so large that vector 
autoregressive models are not able to handle the multivariate time series efficiently. 
Reduction of the available time series to few factors allows efficient and interpretable 
models to be estimated. Factor extraction has to be accomplished in such a way that 
only negligible or little amount of information be lost. 

The study of the eigenvalues and eigenvectors of the parameter matrices was early 
suggested by (fl8|) to produce a simplified version of an autoregressive model. A canon- 
ical transformation of a vector autoregressive model based on the simultaneous rela- 
tionships between variables was introduced by (2) . The relationships between different 
time lags were considered by dianddil) in the frequency domain. The principal com- 
ponent analysis was extended in the frequency domain by (3j). Identification of the 
number of factors in multivariate time series process was addressed by (14). 

Factor models are strictly related to the diffusion indexes methodology (for instance 
|20|) . As pointed out by (6), when the dimension is large vector autoregressive (VAR) 
and vector autoregressive moving average (VARMA) models are difficult to estimate 
because the number of parameters grows with the number of time series quadratically. 
On the contrary, for dynamic factor models the growth is linear. 

Usually mutually uncorrected factors are assumed, whilst individual factor time 
series may be autocorrelated. The multivariate dynamic structure of the observed A'- 
component time series y, may be modeled through the matrix factor A, that is 

y, =Ax, + T7( 

where Xt is a vector of K independent time series and Tj, is the idiosyncratic disturbance. 
Each factor time series may follow a linear model, that is 
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where B is the back-shift operator and ei.t is uncorrelated white noise. This leads to the 
dynamic factor model 

y,=Ae{B)£, + rit 
where 0{B) = diag {9i (B),. . . , 9k{B)). This model is a special case of 

y, = v^(B)e, + Tj, 

p-t 

as considered in (5), for instance. The equality 

V/,-,;(B)=fl,j0,(B) 

reduces to the assumption that the impact of any shock e,- , on the observed time series 
yi,t decays over time in similar way for any This assumption may also be justified on 
the ground of the asymptotic results by (la), p. 456. 

Outliers in time series were introduced by ('7) according to two different models, 
the additive outlier (or aberrant observation) and the innovation outlier (or aberrant 
innovation). This latter impacts the observed time series for some time span after the 
occurrence date, the former affects only one observation at the date of its occurrence. 
In spite of this, the additive outlier has serious effects on parameter estimates and fore- 
casts, while the effects of the innovation outlier is often less serious. This motivates our 
choice for modeling outliers in the dynamic factor model as outlying observations of 
the additive type. 

The plan of the paper is as follows. In Section |2] we introduce the outlier structure 
that we assume to be possibly present in a dynamic factor model. This structure and 
its implications will be examined in detail in Section [3] A method for checking the 
adequacy of the dynamic factor model to fit the data will be illustrated in Section |4l 
Methods for estimating the dynamic factor model parameters are discussed in Section 
|5]and the impact of outliers on the estimates will be examined in Section|6] In Section 
|7] a procedure for identifying outliers and estimating their size is presented and illus- 
trated by an example. A simulation experiment for checking the effectiveness of the 
procedure in comparison with a multivariate model-based method (23.) and a projec- 
tion pursuit-based procedure (HI) will be reported in Section |8l Our procedure is then 
applied to a set of real data, that is some quarterly economic data on asset prices, ac- 
tivity, wages, goods and commodity prices from the seven-country data set studied by 



(l22h . Results are reported in Section |9] Section [TOl concludes. Proof of theorems are 



found in Appendix[ 



2. The dynamic factor model with outliers 

Let yt be an observed A^-component vector time series and the temporal index t = 
1 , . . . , r. We may even assume that the number of the time series components is 
greater than the number T of dates when observations were made. We assume further 
that though may be very large the observed time series is actually explained by a 
much smaller number of unobservable time series — {xit,... ^XKt)' and an idiosyn- 
cratic A^-dimension disturbance Tj,. Then the dynamic factor model with outliers may 
be written 

=Ajc, + (oA, + 77,, (2.1) 
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where A \5 axv N y. K matrix of rank K, K <^ N. The outUers occurrence dates are 
modeled by the binary series {A,} and by the x 1 vector co which represents the 
outlier size. 

Let us make about model (12.1b the following assumption: 

1- {xir}, {x2t}, ■ ■ ■, {xKt} are mutually independent standardized random processes, 
i.e. E{xii) = 0, E(jc|) = 1 for any ; and /, and E(x,iX/s) =Oiii^ j. 

2. The dynamics of the unobserved factor time series x, may be modeled as 

where = 1, IJ^oC^;?^ < i ^l, ■ ■ ■ ,K, and e, = {ei,, . . .^Ek,} are Gaus- 
sian white noises mutually independent at all leads and lags with diagonal variance- 
covariance matrix Eg . 

3. {A^} is a deterministic scalar binary time series and o is a non random x 1 
vector For an outlier occurring at time fo, A( = 1 if f = fo and A, = otherwise. 

4. T], = {t7i,, . . . , r\Nt} are Gaussian stationary time series both serially and mutually 
independent at all leads and lags with diagonal variance-covariance matrix E,) . 

5. The vector time series £( and r]t are mutually independent at all leads and lags. 

These assumptions are motivated by the idea that the dependence among the ob- 
served time series components is entirely explained by the factors. Therefore the id- 
iosyncratic terms are also independent, since otherwise they would contribute to ex- 
plain correlations between two observed components and should be put into the factor 
vector 

In general model (12.11 ) is not identified unless some assumptions are made about 
either the matrix A or the vector time series Xt . In fact, let C be any non-singular K x K 
matrix. Then in model ( 12.1b we have 

.^-f ^ — AC . 

By letting A* = AC^^ and x* = Cxt model ( 12.1b could be written as well 

yt ^A*x; + coA, + ri,. 

No restriction is made on the matrix A except that its rank is equal to K. Notice that 
Assumption 1 does not imply any loss of generality. In fact if rT(0) = cov(x,,ji:J) is 
not the identity matrix Ik we could replace Xt with the transformed data rx{Q)^'^Xi. 
As r;f(0) is positive definite then a factorization Tx{0) = Tx{0)^ {Tx{0)^)' exists for a 
non-singular matrix rv(0)2. The variance-covariance matrix of the transformed data 
turns out to be the identity matrix. 

We prove the following theorem in Appendix lA.il 

Theorem 2.1. Model ( 12.71 ) under Assumptions 1 and 2 is identified up to factor sign 
changes. 

It has to be noticed that model ( 12.11 ) is uniquely determined by Assumptions 1 — 5 
up to a permutation matrix and changing of sign. In fact, the order of the factors may 
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be taken arbitrarily without affecting the model's structure. Moreover, Assumption 1 
determines the factors sizes but each factor may be multiplied by ±1 without affecting 
its variance. 

We may write the relationships that link the variance-covariance matrices of the 
observed data with those of the factors and of the idiosyncratic component 

Ty{0) = Ar,(0)A' + E,, = A4' + , 

and 

Yy{h)=AY^{h)A', h^O. 

Let 7ij{h), i,j = 1, . . . ,A^, denote the entry in row / and column j of the matrix ry{h), 
and 7f (/z), / = l,...,K, denote the diagonal elements of the matrix rx{h). 

Note that Assumption 1 is used that implies rv(0) = / in the first equality, while the 
second equality shows that the matrices {F, (/!)}'s are symmetric because the {rv(/i)}'s 
are diagonal. 

It is sometimes assumed that the columns of A are orthogonal. This ensures the ad- 
vantage that those columns are eigenvectors of all the co variance matrices of {y, } at any 
lag (see, e.g., 14). However, we feel that such assumption, together with Assumption 
1, is somewhat unrealistic and it will not be formulated here. 

3. Outliers in factor models 

The estimation of outliers in Equation ( 12. Il l is greatly simplified if a linear transform 
of the data exists that may highlight the impact of outlying observations. If parameters 
in Equation (12.1b are assumed known, then, by taking the projection matrix Z = / — 
A{A'A)^^A', the following lemma is easily proved. 

Lemma 1. Let the N x K matrix A be defined as in Equation (|2]7). Then aN xN matrix 
Z exists such that ZA = f is the N x K zero matrix). 

Lemma [T] has interesting implications concerned with the outliers estimation in 
model (12.1b . The matrix Z projects the vectors of into the space orthogonal to the 
space Va spanned by the columns of A. Let denote this orthogonal space. By letting 
V be the space of the vectors in M.^ we have V =Va® ■ Any vector in V may be 
written as the sum of a vector in Va and a vector in V^- Then three cases may occur 

1 . Zo) = 0. In this case o e Va, that is there exist coefficients ci , C2, . . . , c/f such that 

(O = a\Ci +a2C2 + . . . +aKCK, 

where ai,a2, ■ ■ ■ ,flA: denote the columns of A. We may write CO — Ac, where 
c = (ci,C2, . . . tCk)' ■ Then Equation ( 12.1b becomes 

y, ^A{x,+c^,) + r^,. 

The outliers are entirely within the factors, that is the observed yt are affected by 
outliers only through the factors. 
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2. ZfO^Q and A(A'A)"'A'(a = 0. This means that CO e V/. The outUers impact the 
observed yt but the factors are actually outlier free. 

3. Zft) ^ and A (A 'A)^'A'a) ^ 0. Then CO maybe written as the linear combination 
of a basis in V obtained by assuming the columns of A as a basis in Va and a 
basis M = [mi ,OT2, . . . ,mN-K] in V^- We have 

K N-K 

CO = Y_aiCi+ mjfij ^Ac + Mfi =Ac+ ^ (3.1) 

i=l ,/=l 

for some coefficients vectors c and jj.. Model ( 12.1b becomes 

y, =A{xt + cA,) + CA, + r^,. 

The observed time series is affected by an outlier of size ^ that adds to the 
whole structure and an additive outlier c, in each factor Xr.i . 

Cases (1) and (3) may be treated by estimating x+ = x, + cA, as if it were actually 
the model factors. Univariate search may be performed on the estimated factors to 
discover outliers dates and estimating their sizes. 

We underline that in case (1), when Zco = 0, the dynamic model pattern is not af- 
fected by any perturbation. This latter is only transmitted by the model from factors to 
observed data. In that case the projection method we propose here is unable to iden- 
tify the outliers, and they can only be discovered estimating the factors and employing 
univariate outlier search. 

In cases (2) and (3) detection and estimation of outliers that impact the observed 
directly may be performed on the transformed model 

Zyr^ZCA,+Z7]r 

where <E V^- Note that in case (2) we have CO — while in case (3) Equation (13. Il l 
holds so that co^ ^.In case (3) the outlier size co has to be estimated partly in dynamic 
factor model and partly in the factors. 

A similar development applies if the following dynamic model, as proposed by 
is assumed: 

s 

yr = L Vu£t-u + ri, + coAt (3.2) 

M=l 

where the y/„'s are N x K matrices, e, is a /iT-dimensional completely white noise with 
variances equal to 1, sK < N and Assumptions 3, 4, 5 above hold. Let V^, denote the 
space spanned by the columns of the matrices {i/u, m — l,...,s} and = Vy/ V^, 
and Z the projection matrix onto V^. We have 

Zy, = Zri, + {Zco)A, . 

In this case also co may be written (but not necessarily in an unique way) as 



CO = Yici + Y2C2 + ■■■ + Ys^s + C 



R.Baragona and F. BattagUa/OutUers in dynamic factor models 



398 



where e V^, and the model may be expressed as follows: 

.V 

yt = Y^Wu {£t-u + c„A,) + ^, + CA, 

which decomposes the effect of an outlier into two parts, one that perturbs the dynamic 
structure of the model by altering the effect of the past values of the factors (c„) and 
the other one simply superimposed to the observation (Q. 

Usually the matrix A is unknown and we may apply the preceding procedure only by 
computing an estimate A. The presence of the outlying observations themselves makes 
the estimation difficult and often unreliable. Under some additional assumptions the 
following theorems allow an alternative procedure to be entertained which does not 
require estimating A. In what follows, all eigenvectors are normalized, i.e. they are 
taken with modulus equal to one. 

Theorem 3.1. Let yt satisfy model f l2.il ) with Assumptions 1 — 5 and suppose that for 
each j = 1,2, ... jA" there exists a lag hj ^ such that Yji^j) 0- Then z'A = if and 
only ifz is eigenvector associated with a zero eigenvalue of each Ty(h),h ^ 0. 

Proof is in Appendix lA.2l Note that the assumptions of Theorem l3 . 1 l are not satisfied 
if one of the factors is white noise. 

Theorem 3.2. Letyt satisfy model ( 13.21 ) and Assumptions 3 — 5 and suppose in addition 
that 

(i) rank(y/i) = K 

(ii) There exists a lag k, 2 < k < s such that rank(y/<.) = K 

then z' = 0, M = 1 , 2, . . . , i if and only ifz is eigenvector associated with a zero eigen- 
value of each Yy{h),h — 1,2, ... ,i — 1. 

Proof is in Appendix lA.3l 

We note that Theorem 13.11 does not hold in general for h ~ since in that case 
Ty (0) = AA' + Ejj . Nevertheless, if the idiosyncratic disturbances are homoscedastic 
then the following theorems hold. 

Theorem 3.3. Let z be any eigenvector associated to the smallest eigenvalue ofTy(0). 
Let us assume, additionally, that E,j = a^L Then z G V^, that is z'A = 0. The converse 
is also true. 

Proof is in Appendix lA.4l 

A similar result holds for the dynamic model ( I3.2l l. 

Theorem 3.4. Let yt satisfy model ( 13.21 ) and suppose that 'L-q = C^/. Then if z!\fu = 
0,M = 1, ... ,5, z is eigenvector associated to the smallest eigenvalue ofTyiQ), and the 
converse is also true. 

Proof is in Appendix lA.51 

The preceding theorems suggest a procedure to compute a projection of the multi- 
variate time series that allows potential outliers to be readily detected. 

If the hypothesis of homoscedasticity is assumed, we may compute an estimate 
TyiO) (possibly a robust estimate) of the variance-covariance matrix ^ (0). Then con- 
sider the eigenvectors associated to the smallest eigenvalue of ^^(O) (the smallest 
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eigenvalue may have multiplicity greater than one). Let z be any such eigenvector, 
then, according to Theorem l3 . 3 1 and [34l for the univariate time series z!yt we have 

^y,~^r]t + {7!(o)^,. 

Any such projection of the multivariate time series may be analyzed by means of uni- 
variate methods to detect potential outlying observations. Then evaluate the outlier's 
size by assuming the dates of occurrence of outliers from univariate analysis and using 
estimation methods in the multivariate framework. 

If the homoscedastic hypothesis may not be assumed, the same result is obtained 
using Theorem 13.11 and 13.21 by taking z equal to the eigenvector associated to a zero 
eigenvalue of a Tyih) for h>0. 

4. Factor model adequacy 

A crucial point is whether the simple factor model (12.1b together with our Assumptions 
fits the data adequately. Increasing the number of factors K does not solve the problem 
because not all processes may be represented by equation (12. It for arbitrary K and 
under Assumptions 1 — 5, since their autocovariance matrices have to be symmetric 
as seen before. This suggests that a measure of adequacy of the factor model might be 
obtained by evaluating the differences between the elements and (j,/) ofYy{h),or 
the autocorrelation matrix. Let rij{h) = Yij{h){fj^{Q)'f'j{Q)}^^l^, and denote by r,j(/z) 
the corresponding estimate. If Ty{h) is symmetric, using classical results (see, e.g., [13) 
we obtain that the difference r,;(/j) — fji{h) is asymptotically normal with mean zero 
and variance 

var{r,-,(/z)-ry,(/i)} 

2 °° 

- ^ Vni'^ViM) ^ ri.Mf ~ ni{u)rjj{u - 2h) + rij{u)rij{u ~ 2h)] 

and it depends both on the cross-correlation and autocorrelation functions in a compli- 
cated way; furthermore, such differences are correlated for different indexes / and j. 
Therefore the differences r,y (/i) — fji{h) cannot be used in any plausible way to test the 
hypothesis that Ty{h) is symmetric. However, a possible solution is found turning to the 
frequency domain, in an analogue way as proposed by (9) when estimating parameters 
of factor models. 

In the frequency domain the symmetry of the covariance matrices Ty{h) for any h is 
equivalent to a real spectral density matrix for any frequency. Let 



F{1) = 




denote the spectral density matrix of . We prove the following theorem in Appendix 
1X61 

Theorem 4.1. IfTy(h) is symmetric for any h, then F(X) is real for any A and vice 
versa. 
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We shall therefore test the hypothesis that F{X) is real for any X. 
Let us define the Fourier transforms as the x 1 complex vectors 



From Theorem 4.4.1 of (3) it follows that the real 2A^ x 1 vector [RedriX)' ,lmdT{XY]' 
converges as T ^ oo to a normal random vector with mean zero and variance covariance 
matrix: 

1 r ReF(A) -Imf (A) 

2 [ lmF{X) ReF{X) 

If the spectral matrix is real, lmF{X) =0, thus RedriX) and ImdriX) are (as T ^ oo) 
independently identically distributed normal vectors with zero means and variance co- 
variance matrix jF{X). Therefore the hypothesis of real spectral density is equivalent 
to the independence of two normal vectors and may be tested by means of likelihood 
ratio. However, only one observation would be available for each fixed A. To overcome 
such difficulty, and to test reaUty for any A, we use a device similar to 

Let Xj = 2nj/T denote the Fourier frequencies, and suppose that T is sufficiently 
large so that for a set of frequencies {Xj^a < j < b} we can assume F{Xj) ~ F. Also. 
letJ = b — a and 



Red{Xa 



X 



lmd{Xa+j) , j = l,...,J 



where we have dropped for convenience the dependence on T. For T large we may 
assume that 

' ■ \ 1 / ReF -ImF 
/ ' 2 I ImF ReF 



'N 



while under the null hypothesis Hq: F real, X^ and Xj are independently identically 

distributed normal vectors with zero means and variance covariance matrix jF. Define 
the variance estimates: 




j:xf(xfr+x^(xjr 



^sj=j[T^iixf)'-xf(x^r^ 



The following theorem provides the likelihood ratio test. 

Theorem 4.2. The likelihood ratio test statistic for the null hypothesis Hq: F real is 
given by 

and its distribution under Hq is equal to that of the statistic Um.n,J-N-1 of (dt chap. 9). 

Proof is in Appendix |A.7| Some approximations are discussed in (Gl) , which for our 
statistic imply approximating —m\ogU (m = J — N — |) by a chi-square variable with 
de grees of freedom. The test of Theorem l4.2l mav be employed repeatedly on non- 



overlapping frequency intervals, and the usual caveats for multiple testing apply (see, 
e. g.,[Tll chap. 5.4). 
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5. Estimation problems 

Outliers identification is only concerned with the detection of time points where outly- 
ing observations occur When this task is performed by examining a univariate projec- 
tion series, as is suggested in this paper, little may be said about the multivariate out- 
liers size. The model parameters matrices, either A in the model's formulation ( 12.1b or 
{i/Ai , 1//2, . . . } in the model's formulation ( 13.2b . have to be estimated from available data 
{yt,t = 1, . . . , r}, along with the common components and the variance covariance ma- 
trix E,] of the idiosyncratic component. This way outliers size may be estimated while 
the estimated model is available for studying the relations among the observed time 
series or for forecasting purpose. 

We shall distinguish in estimation procedures whether the idiosyncratic covariance 
matrix is constrained to the relationship = a^I, or to be a diagonal matrix, or no 
special constraints are imposed on its entries. Also, we consider here the unperturbed 
case of absence of outliers. The distortion induced by the presence of an outlier will be 
considered in the next section. 

The log-likelihood of {yi,y2i ■ ■ ■ ijr} according to model ( 12.1b under the assumption 
E,, = a^I is 

L = - ^ log 27r - A^r log _ _1_ £ (3,, _ Ax, )' [yt -Ax,). 
Let us define the N xT matrix 

Y = \yuy2,---,yT], 

where the y/s are x 1 arrays, and the K xT matrix 

[xi,X2,...,xr], 

where the x,'s are K x I arrays. Then the sum of squares in the log-likelihood may be 
written 

ti{{Y-AX)'{Y-AX)}, 

and its minimization is equivalent to maximizing the likelihood. 

Model (12.1b is considered by (20) and {21) who assume normality, E,j ~ a^I, and 
treat the factors {x,} as deterministic components. They show that, on maximizing the 
likelihood with respect to both {x,} and A, the maximum likelihood estimate of A is 
given by the matrix formed by the K eigenvectors associated to the K largest eigenval- 
ues of rv(0). They assume A' A = /; if we want to dispense with such hypothesis, we 
may use instead the fact that rv(0) — I (actually XX' = / is assumed for simplicity). 
This leads to the following different estimate. 

Theorem 5.1. Let {y,} satisfy model f l2.7l ) with Assumption 4 with E,j = G^I, and 
suppose that {x,} are constants andXX' = I. Let r be the rank ofY and K be a known 
pre-specified integer, so that < K < r < min(A^, T), and /ef Ai > A2 > . . . > be the 
eigenvalues ofY'Y with associated orthogonal eigenvectors ui,U2, ■ ■ ■ ,uj in R^. Then 
the problem 

min{tr((y-AX)(y-AX)')} subject to XX' = Ik 
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is solved by 

X = [u\,U2,--- ,uk] 

and 

A = YX'. 

Further, the formula for A reduces to 

A=WkAI/\ 

where Wk = [wi, . . . ,wk] (wi, W2, ■ ■ ■, wk are the eigenvectors associated to the K 
largest eigenvalues ofYY'), and t^J^ — diag(-\/Xi", . . . , \/^). 

Proof is in Appendix IA.8I The estimate of the matrix A given by Theorem 15.11 is 
consistent as the estimate YY' /T is known to be consistent and its eigenvalues and 
eigenvectors (which are continuous functions of the elements of the matrix YY' /T) are 
consistent as well. If the observed data are not standardized, and if the eigenvalues are 
all distinct and the true variance covariance matrix is definite positive, then it may be 
shown that the eigenvalues are asymptotically independently normally distributed. The 
difference between estimated eigenvalues and actual ones is of order 1 / VT in probabil- 
ity. The estimates of the eigenvectors are asymptotically normally distributed but they 
are not independent (see, e.g., p. 290). Rate of convergence to actual eigenvectors 
is of order 1 / \/T in probability. 

If we assume that the factors {jc,} are random processes, the method of linear factor 
analysis may be employed. To overcome the problem that the factors are autocorrelated, 
i9) has introduced a frequency domain extension of the factor analysis which may 
be directly applied to model (12.1b assuming that Y.-^ is diagonal but not necessarily 
homoscedastic. 

An alternative estimation method is using a Kalman filter in a state space formu- 
lation of the model, where ( 12.11 ) is considered as a measurement equation and {x,} is 
the state vector. In that case, a transition equation has to be specified for the factors Xf 
which may be convenient if we assume that the process {x,} is easily modeled in state 
space form (if, for example, it is assumed a low order autoregression). 

Finally, an estimation method which does not rely on any assumption on E,j may 
be obtained using a technique of temporal blind source separation, for instance the 
temporal decorrelation source separation method (25) which uses an algorithm for ap- 
proximate simultaneous diagonalization of several covariance matrices. Under model 
(12.1b we have 

Ty{h)=AY^{h)A' h^O, 
and, taking the generalized inverse B = (A'A)^^A', 

rv(/z) = Br,.(/z)B'. 

Since rx{h) is diagonal for any h ^ 0, we may determine B in such a way that the 
off diagonal elements of BYy{h)B' are as small as possible. Formally, the matrix B is 
obtained by the following approximation problem 

minf ^(Br,,(/.)B')?,-. 
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Let B denote the solution, then 

A=B'{BB')-\ 

A maximum lag H has to be chosen. This may be selected by estimating the co- 
variance matrices of the data rv(.) and taking H the minimum lag such that all entries 
of Ty{h), for h > H, are not significantly different from zero. Moreover, in order to 
recover the matrix A is necessary that {BB')^^ exists, therefore the solution matrix B 
should have full rank. Though it is easily seen that under model (12. Il l this method is 
consistent, its sample properties appear very hard to be devised. 

If model ( 13.2b seems more suitable to describe the data, the estimation methods 
proposed by (^ may be applied. 



6. Bias induced by the outliers on the estimates 

We turn now to consider the bias induced on the estimates of time-domain and frequency- 
domain indexes by the presence of an outlier 

Suppose that the observed time series {yt, t — 1, . . . ,T} contains an outlier at time 
tQ and size measured by the vector co. Let z, denote the unperturbed data. 



Zt 



yt if f 7^ tQ 
ytQ-o) if f = fo 



We compare the autocovariance or spectral density estimates computed on the actu- 
ally observed series with the corresponding unbiased estimates that would have been 
obtained from the unperturbed time series Zt ■ 
Note first that for the average y we have 

1 ^ 1 Y- 1 1 

y^jZ^yt = -2^zt + -co = z + -co, 

therefore ^ 

yt-y = zt-z- —co + coA,, 

where A, = 1 if f = fo and zero otherwise. If we denote 

the unperturbed autocovariance estimates (computed on the unobserved time series z,), 
the actual estimates may be written 

1 1 ^^'^ 1 1 

r (/! ) = - 1^ (yr - y ) (yr+A " y )' = - 1^ (Zr - Z - - CO + ffl A, ) (z,+,, - Z - - CO + CO A,+,, ) ' , 
^ r=l ^ 1=1 ^ ^ 

and a simple calculation gives 

f (0) = f (0) + ^ + i{(z,o - z)co' + co(z,„ -z-)'} - ^ 
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and 

t{h) - f{h) + - z)(o' + co{z„+h-z)'} 

- f2 + (oizt+h-z)'} - ^^(0(0'. (6.1) 

It follows that the difference between unperturbed and actual estimates is of order l/T 
in probability. We prove in Appendix |A. 91 the following theorem which states a more 
precise result. 

Theorem 6.1. Let {y^, t ^ I, . . . ,T} be part of a realization of a second order station- 
ary process with finite covariance matrices and suppose that at time tQ an outlier equal 
to 0) is added. Then if the fij{h)'s denote the covariance estimates and the fij(h)'s 
denote the correspondent estimates computed on the unperturbed time series, 

E{r(7,,(0) - 7„(0))} - (Or(0, + 0{T-') 

var{r(7„,(0) - 7„(0))} = (ohrr{Q) + a)27,,(0) + 2a),a),7„(0) + 0{T-^), 
and, for h ^0, 

E{T{%{h)-rUh))} = 0{T-') 

var{r(7,,(/i) - 7,,(/z))} = (O^YrriO) + (O^YsM +2(Os(Orrrs{2h) + 0{T-^). 

Results in the frequency domain may also be obtained, with the usual asymptotic 
approximations. On denoting by 

1 ^ 

d{>^) = -i^Y.^Zt-z)&^v{-i>^t) 
\J2%T 

the Fourier transforms of the unperturbed data, we may write 

d{l.) = (3'' - y) exp(-;Af ) = --^= ^ (zr - z + «Ar - y ) exp(-/Af ) 

= d{X) + --jl=(oexp{^iXto) - -/=Y E exp(-af)- 
If we consider only the Fourier frequencies ~ ^ j, the third term disappears and 

d{Xj) = d{Xj) + -^l=(ocos{XjtQ) - /-^^tosin(A/o), 
and simple calculations show that for the actual and unperturbed periodograms 



I{X)^diX)d{X) , I{X) ^ diX)d{X) 

we have 

I(Xj)^I{Xj) + ^+A{Xj)+iBiXj), 



R.Baragona and F. BattagUa/OutUers in dynamic factor models 



405 



where 



1 ^ 

= — Y,co^X{ta~t){o){zt-z)' + {zt-z)o)'}, 



and 

fi(^) - E sin^(fo - t){{,z,-z)(o' - (o{z,~-z)'}. 



1 ^ 



We prove in Appendix lA. lOl the following theorem which gives more specific results 
as regards each entry of the matrix of differences between periodograms. 

Theorem 6.2. Let {y,, f = 1 , . . . , T} /je part of a realization of a second order station- 
ary process with spectral density matrix -F(A) and suppose that at time fo an outlier 
equal to CO is added to the time series. Then if denotes the periodogram matrix 
andI{X ) denotes the periodogram matrix of the correspondent unperturbed series, for 
each I < r < s < N, 

ETRo{UXj)~UXj)} = ^ 

and 

ETlm{Ir,{Xj)-irs{Xj)}^0. 

Let s(T) be a sequence such that X [T) = 2lts{T) tends to X as T Then, as T 

y?.v^{T)Rt{Ir.{X{T))~UX{T))} ^ -^{co^f,,{X) + co^frr{^^ 

and 

vary(r)Im{/„(A(r)) -/,,(A(r))} ^ ±{(o^f,,{X) + co^f-riX)- a),a),(/,,(A) +/„.(-A))}. 

4-n 

The preceding results enable also to evaluate the bias induced by an outlier on spec- 
tral estimates. Let 

j=-T/2 

be a consistent spectral estimate where Xj = and wt{-) is a spectral window (see, 
e.g.. Il7h. and denote by F{X) the same form computed on /(A). We assume that for 
each T a truncation point M — M{T) has been selected such that 

and, as r ^ oo, M ^ oo but M/T 0. 
Theorem 6.3. Under our assumptions as T °° 

ET{F,,{X)~UX)}^^, 
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var{^Re(F,,(A) -F„(A) - ^)} 

^ ^{co}.f,,{X) + cojf,-,-{l) + a),a),(/,,(A) (6.2) 

var{ lm{Fr, ( A ) - F„. ( A ) ) } ^ ^ { co^^ ( A ) + w^/n- ( A ) - to, O), (/,, ( A ) + /,, ( - A ) ) } . 
Proof is in Appendix lA. 1 1 1 

The preceding discussion offers some arguments for choosing among estimation 
methods. The methods based on the variance-covariance matrix or on the spectral den- 
sity of the observed data have the drawback that the estimates may be biased by the 
presence of the outUer. The influence of the outlier on the estimates is of order 1/T 
and this would suggest that for large samples the two methods may perform equally 
well. Spectral methods, however, seem to constitute the favorite device to be used for 
checking dynamic factor model adequacy. The methods based on temporal decorrela- 
tion on the average are not affected by the presence of the outlier. This would suggest 
that these methods are likely to yield more reliable results. However a treatment of the 
sampling properties is not available and model adequacy has to be checked in any case 
to ensure that temporal decorrelation may be applied properly. Summing up, it seems 
that no estimation method may be declared the best one, and trying different methods 
seems advisable. 



7. Outlier identification and size estimation procedure 

Let Y be the observed data arranged as a matrix of rows and T columns. The entry 
Yit stands for the observation at time t of the i-th time series, t = 1,. . .,T and / = 
1,...,N. We assume that a dynamic factor model may be tentatively fitted to the data 
with unknown number of factors and possibly outlying observations of unknown size at 
unknown dates. The idiosyncratic component is unknown as well. Given Assumptions 
1 — 4 and the observed data set, model ( 12.11 ) is to be identified and estimated. 

A procedure for estimating the unknown components of model ( 12.1b and performing 
outlier identification and estimation may be described as follows. 

1. The optimal direction for locating the occurrence of outlying observations re- 
quires the computation of the variance-covariance matrix of the data rv(0) and 
of its smaUest eigenvalue A, i.e. 

f3,(0) = l(r-?)(F-?)' 

where Y =yl' . The x 1 array y is the components average computed over time 
and 1 is the all-one T x 1 vector Then the eigenvalue A and the associated x 1 
eigenvector z may be computed. The linear transform w — zfY yields the I xT 
time series that may be searched for the outlier occurrence dates. Let w and 
be the mean and standard error of w. Then the outUer is identified at time t if 
|w, I is the largest value such that |w, — wj > aa^ where a is a suitable constant. 
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Since this procedure is based on Theorem 13.31 it strictly holds only in the ho- 
moscedastic case, and is only approximately appropriate when such hypothesis 
does not hold. An alternative would be computing the eigenvector associated to 
a zero eigenvalue of one of the matrices Ty{h) for h > Q, for example ry(l), 
according to Theorem l3 . 1 1 and U?2\ 

2. If an outlier is detected, then the multivariate time series may be adjusted by 
interpolating (by multivariate linear interpolator) or forecasting (by vector au- 
toregressive (VAR) model) its value at the outlier date. Another strategy may 
consist in assuming that there is a missing value at the time of the outlier pos- 
sible occurrence. Anyway, the potential outlier is either replaced by its condi- 
tional mean or a missing value is assumed, and the outlier free estimates of 
variance-covariance and lagged covariance matrices may be computed up to a 
pre-specified maximum lag M. Robust estimation methods may constitute a valu- 
able choice. Either outlier free or robust estimates of the spectral density matri- 
ces have to be computed at the Fourier frequencies Ay, where Xj = Inj/T and 

; = -r/2 + i,-r/2+2,...,r/2. 

3. Checking conformity of the observed data to the dynamic factor model may be 
done by using the spectral density estimates. For instance, let the frequency in- 
terval < A < ;r be divided in 4 sub-intervals and assume for simplicity that T is 
integer multiple of 4. More than four sub-intervals may obviously be used, also 
according to the number of the data. If there is no reason for privileging any fre- 
quency components, equally wide sub-intervals may be selected. Symmetry con- 
siderations allow us to consider only the interval (0, n) instead the whole interval 
{—n,K). Then in each sub-interval {Xa+ 1 , A/, ) there are J ~T /A Fourier frequen- 
cies, i.e. {Aj, j = fl + l,fl + 2,...,fl+7 = /7}.In the ^-th sub-interval a — {t—\)J 
and b = U. The likelihood ratio test statistic U is provided by Theorem |4.2| The 
null hypothesis Ho that the covariance matrices are symmetric, i.e. the dynamic 
factor model may not be rejected, has to be checked by using the approximate 
statistic —mlogU (m = J — N—j) which, under //q, is distributed as a chi-square 
with A^^ degrees of freedom. 

4. Once the model has been found appropriate, the number of factors K has to be es- 
timated. A simple device is based on the eigenvalues of the variance-covariance 
matrix rv(0) that is rv(0) corrected for potential outliers. Let 

V = {Vi,V2,...,Vn} 

be the eigenvalues of rv(0) arranged in non increasing order and consider the 
cumulated sums 

Vl = Vi, V2=^Vi+V2, Vn^Vn-I + Vn. 

We assume K as the number of factors if it is the smallest integer such that 
Vk/Vn > 1 — OC, where a is a small real number, 0.05 for instance. That is, the 
number of factors is chosen so as the cumulated normalized eigenvalues of rv(0) 
exceed a fraction 1 — a that is judged large enough. We may want to take into 
account that the smallest eigenvalues may be greater than zero. So a better ap- 
proximation to the correct number of factors may be obtained by assuming K as 
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the smallest integer such that {Vk+ {N — K)Xmin\ I^n > 1 — «. More information 
may be used for choosing K along these same guidelines by using the eigenvalues 
of the (corrected) spectral density matrix at frequency zero which is essentially 
the sum of the covariance matrices for lags — M, — M + 1 , . . . , 0, 1 , . . . , M. The 
covariance matrices may be used as well separately, and so the spectral matri- 
ces at non zero frequencies. However, different matrices may lead to different 
estimates of K though the same value is to be expected in most cases. 
5. Estimation of matrix A and factors X allows the dynamic factor model to be 
specified completely. In addition, both A and X are needed to estimate the outlier 
size and to distinguish if the factors, the model or both are affected by the out- 
lying observations. Several methods are available, for example we may list the 
following ones. 

(a) Temporal decorrelation, i.e. a matrix A not necessarily squared nor orthogo- 



nal may be computed by approximate joint diagonalization (|24|) of matrices 



Tyih), h — 1 , . . . Let the N x K matrix A be such matrix, that is 

ty{h)=ADi,A', h^l,...,K, 
where the D/,'s are diagonal K x K matrices. Then we may let 

B^iA'Ay^A' 



and 

It follows 
for 



X=BY. 



Bty{h)B' = t:,{li) ^Di, h = l,...,K 



r^{h) = ^XX' = ^BYY'B' = Bry{h)B' = Dh. 



(b) Assuming, for th e purpo se of estimating A, that the factors are not random, 
as suggested by (21tl22h. the method outlined in Theorem l5.1l mav be used, 
which only requires the K eigenvectors associated to the K largest eigenval- 
ues of the matrix YY' be computed. Assuming Wk the N x K matrix whose 
columns are the column eigenvectors and Ak the diagonal matrix with the 
largest eigenvalues of YY' on its diagonal, the estimate of A is given by the 

1/2 

simple formula A = W/f A^' . 

(c) Methods developed in the factor analysis framework may be used. Consider 
again the log-likelihood of the dynamic factor model 

NT T 1 

L = - — log27r - - log I - -ti-{ (Y - AX)'E^ ' (F -AX)}. 

Let us assume at this stage that the matrix Ejj is known. Maximizing the 
likelihood with respect to matrix A only yields 

A = Ef e(Aj,-/j,)i/2, 
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where A.k is the diagonal matrix with the K largest eigenvalue of the matrix 

— 1 /2 ~ —1/2 

Ty {Q)^r\ on its diagonal and Q is the matrix whose columns are the 
corresponding eigenvectors. On the other hand, given A, the likelihood is 
maximized by letting E,j = diag(f ^ (0) —AA'). By substituting A to A we 
have the approximate formula 

E^=diag(r,,(0)-AA'). 

Given an appropriate starting value for we may apply iteratively the 
formulas that yield A and E,j until some convergence criterion is met. For 
instance, the algorithm may stop if the difference between two consecutive 
values of the maximized log-likelihood is less than a pre-specified tolerance 
constant. Proofs of formulas are provided by , pp. 367-370, who warn 
that this method does not guarantee convergence. Nevertheless the algo- 
rithm is simple and effective in most cases, and the entries on the diagonal 
of Ep are not constrained to be all equal. 

6. Finally the outlier size o may be estimated. Note that i,Q may possibly include 
a within-factor outlier a. For the static model, if an outlier has been detected at 
t = fo and given A and X, we may write the log-likelihood of r\t(, 

L = - 1 log(2;r) - 1 log |AA' + E^ I - i {y„ -Ax„ - C)' (AA' +E^ ) - ' - Ai,„ ~Q. 
Maximization of the UkeUhood with respect to ^ yields 

l=yto - Aim- 

To estimate the outlier size (a we will obtain an estimate a so that 

m^Aa + t,. (7.1) 

According to ( 17. Il l the vector & may be written as the sum of a vector that is 
obtained as a linear combination of the columns of A and a vector which is or- 
thogonal to the space spanned by the column of A. The outlier a impacts the 
factors while ^ impacts the model structure as a whole. We may estimate the 
size of a for each one of the components x\,. . . ,xk essentially by building the 
linear interpolator of each i/f^, i — 1, . . . ,K, based on values observed at f 7^ to, 
and identifying an outlier at each time when the interpolator is too different from 
the estimated i,Yo (see e.g. 13, for details). 

As an example, let us simulate T = 100 observations of the multivariate x 1 time 
series {y,} with N ~5 generated by the model 

y, ^Ax, + coA, + rit. 

We assume K = 4 factors, so that the matrix A has 5 rows and 4 columns. Let 
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Fig 1. Plot of eigenvalues of the observed time series spectral matrix at frequency zero 

The factors have been simulated according to the VAR model 

where <I> = diag(.7, — .7, .7,— .7) and {e,} is normal white noise with zero mean and 
variance ~ 1 —0.7^. This ensures that the (theoretical) variance of each factor is 
unity. In addition, as <I> is diagonal, and by normality assumption, (theoretical) factors 
are independent. The idiosyncratic component {tj,} is assumed a zero mean normal 
white noise sequence with variance = 0.04. The outlier was located at f = 100, 
that is at the end of the series. Most outlier detection methods are not able to discover 
potential outliers at the end (or beginning) of the observed time series. 

Outlier size was (0 — (1.5, — 1,0,— 4,5)'. Each component of the generated time 
series yt has (theoretical) variance equal to 1 .29, excepted the first and the last one that 
have variances 1 .04 and 0.29 respectively. The outlying observation is rather large only 
compared to component series 4 and 5, in the remaining cases the outlier size does not 
exceed twice the standard error of the component series. The standard errors of the 
simulated time series {y,} with outlier are (1.0970, 1.2858, 1.1967, 1.1906,0.7415). 

The assessment of the number of factors has been performed by examining the 
eigenvalues of both variance-covariance and spectral density matrices. In Fig. [T] the 
cumulated eigenvalues of F{Q) are plotted. It has to be noticed that the smallest eigen- 
value is greater than zero, so that the threshold that serves as a decision rule about the 
number of factors has been computed accordingly (see point 4 above in this Section). 
This way the correct number of factors K = 4 may be identified. 
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Fig 2. Univariate time series obtained by projecting the multivariate one along optimal direction 

The optimal direction for detecting outlying observations has been computed 

z = (-0.5684,0.4436,-0.3448,-0.1664,0.5775)'. 

The univariate time series obtained as the linear combination {z'y,} is displayed in 
Fig.|2] The outlier at f = 100 is clearly highlighted. Mean and standard error of {z!yt} 
have been computed equal to 0.0902 and 0.6463 respectively. The standardized value 
at to — 100 results equal to 4.7738, larger than the Tchebychev upper bound 4.47 which 
corresponds to the 5% level. 

A VAR has been estimated for the observed time series {y?} and the one-step-ahead 
forecast has been taken to replace the last observation. The corrected time series was 
then used to compute the variance-covariance matrix, the covariance matrices at lags 
1 — 4 and the spectral density matrices for 100 Fourier frequencies from —n to n. For 
checking that the estimated covariance matrices could be assumed symmetric, the in- 
terval (0, n) has been divided in 4 non overlapping intervals, each of which included 25 
frequencies. We obtained for the test statistic the values 10.64, 8.93, 8.77 and 12.32, 
with 25 degrees of freedom: the critical value at the 5% level is 37.65. As it is greater 
than the computed statistics, we may not reject the dynamic factor model hypothesis. 

The estimates A and X have been computed by using the three techniques described 
in this Section, that is approximate temporal decorrelation, eigenvalues-based and iter- 
ative maximization of the likelihood function. The latter two methods are similar and 
yield indeed similar results. The first method is an entirely different approach that takes 
explicitly into account the covariance matrices at higher lags. 

Nevertheless, the estimated dynamic factor model fits the data quite well no matter 
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Fig 3. Observed time series (blue line) and estimated dynamic factor model (red line) 



what method has been used. We report only the plot of the simulated time series (blue 
line) along with its estimate yielded by the approximate temporal decorrelation algo- 
rithm (red line). Other methods yield estimates that overlap almost exactly. In Fig.[3]the 
observed (simulated) and forecasted (estimated) series are plotted for each component. 
We may notice that the outlier is not generally apparent by the visual inspection of the 
graphics. 

Then the outlier size has been estimated as the sum of the two components, the first 
one in the dynamic factor model and the second one in the factors. The two components 
are orthogonal to each other. Also the first one is orthogonal to the space spanned by 
the columns of A while the other one impacts the dynamic factor model as a linear 
combination of the columns of A. The estimates are displayed in Table [T] The three 
methods yield similar estimates of the total outlier size d) and of the component t, that 
impacts the overall model. The sizes of the outlier a within the factors differ because 
the estimated factors themselves depend on the matrices A estimated by each of the 
three methods. The differences are small, however, if we compare the arrays Aa. 

The results reported in Table[T]seem reliable as regards the recovering of the outlier 
size (£). We may compute from the 'true' outlier size (£> the arrays 



a = (1.161,-0.903,-0.903,-0.839)' 
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Table 1 

Outlier size estimates yielded by temporal decorrelation, eigenvalue analysis and maximum likelihood 

methods 





temporal 


eigenvalue 


maximum 




decorrelation 


analysis 


likelihood 




0.0944 


0.2326 


0.6805 




-0.0534 


-0.6887 


-1.1582 




1.2871 


1.4950 


1.9913 




-2.8438 


-2.7532 


-2.8597 




5.4193 


5.3500 


4.9343 


a 


0.4381 


1.4664 


0.9291 




-1.5874 


1.1927 


1.8030 




1.2749 


-0.4097 


0.9445 




-0.1808 


0.0576 


1.2409 


Aa 


1.1473 


1.0374 


0.6822 




-1.0463 


-0.9231 


-0.5516 




-2.0511 


-2.2148 


-2.6060 




-1.1319 


-1.2080 


-1.1289 




-0.2297 


-0.1667 


-0.1739 


a 


1.2417 


1.2701 


1.3627 




-1.5793 


-1.6118 


-1.7098 




-0.7640 


-0.7198 


-0.6147 




-3.9757 


-3.9613 


-3.9886 




5.1896 


5.1833 


5.1082 



and 

C = (0.3387,-0.6774,1.3548,-2.7097,5.4194)'. 

This latter is close to its estimated counterpart (in each of the three versions). As far as 
a is concerned we have to consider that the product 

Aa = (1.1613,-0.3226,-1.3548,-1.2903,-0.4194)' 

is close to Aa. 

8. A simulation experiment 

We performed a simulation experiment by replicating 1000 times a dynamic factor 
model and applying three methods for outlier detection and estimation. This way we 
wanted to test the effectiveness of the method that we are proposing in this paper (let us 
call it ODFM). Then, we made a comparison with two methods that were available for 
detecting and estimating outliers in multivariate time series. The first one was proposed 



by ( l23h to detect and estimate four types of outliers in multivariate time series modeled 
by a vector autoregressive integrated moving-average (VARIMA) model (let us call it 
OARMA). The second one was proposed by (8) as a projection pursuit approach to 
detect and estimate four types of outliers in multivariate time series not necessarily 
generated by a VARIMA model (let call it OPP). 

We confine our attention only to the most common types of outliers, namely the 
additive outliers (AO). An AO impacts the series only at the time of its occurrence, 
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while neighboring observations remain unaffected. Other outlier types were defined in 
the literature: innovation outliers (10), level changes (LS) and temporary changes (TC). 
However, lO's are only defined when the data are assumed to follow a VARMA model, 
which is not our case. LS's arise when the mean levels of each component series change 
at once, and then remain constant. They are equivalent to AO in the difference, and may 
be identified by analyzing the differenced data. Finally, a TC in multivariate time series 
data is defined at f = fo if ^ constant (£> which defines the outlier size is added to jt^ and 
5*0 is added to yt^+h k> 0, where < 5 < 1 is a scalar constant. We feel that a TC is 
a very unlikely behavior in real data, in any case it is easily identified by the existence 
of an exponentially^caying impulse at rate 5 in the univariate projection series z!yt. 

The method of (l23i) assumes that the (A^-dimensional) multivariate time series {yr} 
may be modeled as 

y,=x, + aiB)(o^r^''\ 

where the unobservable multivariate time series {x,} is generated by the VARIMA 
model 

^{B)x,=c + @{B)e,. 

In the latter equality, 

^{B) =I-^iB- ...-^pBP 

and 

©(B) =/-0iB-...-0^S'? 

are N xN matrix polynomials of finite degrees p and ^, c is a A^-dimensional constant 
vector, and {e,} is a sequence of independent and identically distributed normal random 
vectors with zero mean and covariance matrix Eg. Some assumptions are needed to 
ensure that 

X, = +<I>(B)"^©(B)e, = + ^(B)e, 

is a well defined moving average model. Then, a{B) — ^{B) defines an lO and a{B) = 
I an AO. The date of the outlier is defined by the binary variable 4^' ' which equals 1 if 
t — h (that is, the outlier occurs att = h) and otherwise. 

The method of (8) aims at discovering the univariate projections of the multivariate 
time series that best highlight the presence of outliers. The directions that yield the 
most useful projections are given by the projections that either maximize or minimize 
the kurtosis. Moreover, orthogonal directions are to be taken into account as well. The 
number of projections to be examined for outlier detection is 2N, where denotes the 
dimension of the multivariate time series. If lO's have to be detected, then the method 
applies to the residual multivariate time series computed from a suitable model fitted 
to the observed data. 

We simulated 200 observations from the A^-dimensional dynamic factor model 

y, =Ax, + coA, + r]t. 

We assumed N = 20 and K = 4 factors, simulated according to the VARMA model 



X,-<I>X,_i =£,-©£,_!, 



(8.1) 
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where O = diag(0.7,-0.5,0.5,-0.7), = diag(-0.5, 0.7, -0.7,0.5) and {e,} is nor- 
mal white noise with zero mean and variance-covariance matrix Eg = //f — ©<t>' — (<t> — 
0)0' . This choice of Eg ensures that the (theoretical) variance-covariance matrix of 
{xt} is the unit matrix. The matrix A was chosen to have 20 rows and 4 columns. We 
let 



A = 
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The matrix A may be verified to have fuU rank. The (theoretical) factors are inde- 
pendent because both 3> and are diagonal matrices and the {ef}'s are uncorrelated 
normal random variables. All idiosyncratic components {t],} were assumed zero mean 
normal with variance C7^ = 0.04. 

We checked two outher configurations. The first one was an isolated multivariate 
outlier t = 100. The second one was a patch, that is a sequence of neighboring 
outliers at t = 99, 100, 101. The size of each and every outlier was chosen equal to 
0.6. This figure was chosen in comparison with the standard error (7jj = 0.2 of each of 
the idiosyncratic components. The total multivariate outiier size is the A^-dimensional 
vector (O with entries all equal to 0.6. The outlier CD may be spUt in a term ^ orthogonal 
to the columns of A, that is 

C = (/-A(A'A)-U')a), 
and a term A a which is a Unear combination of the columns of A with coefficients 

a = (A'A)-U'©. 

The coefficients a may be thought of as the sizes of an outlier that impacts the factors. 

All computations were performed by using the Matlab package. For each of 1000 
replications we generated 300 independent identically normally distributed /T-variate 
random vectors and 200 independent identically distributed A^-dimensional random 
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Table 2 

Percentages of estimated number of factors in the presence of outliers 



Outlier 


Estimated number of factors 


type 


K = 3 


K = A 


isolated 


0.3% 


99.7% 


t= 100 






patch 


0.2% 


99.8% 


f = 99,100,101 







vectors all with mean zero and variance-covariance equal to the unit matrix. From the 
300 /T-dimensional random vectors 300 observations were generated from the ARMA 
model (18.1b . Then, the data were transformed as explained before, to obtain unit vari- 
ance factors. The first 100 ^T-dimensional data were discarded to remove the effect of 
the (random) initial values. As a result, a /T-dimensional factor time series of length 
200 was obtained. The A^-dimensional white noise was pre-multiplied by the inverse of 
the square root of the matrix Erj . This was the artificial idiosyncratic component that 
was added to the factor data. Finally the two outlier structures were superimposed to 
the artificial data generated from the dynamic factor model. In each replication, and 
for each of the two outlier structures, the methods ODFM, OARMA and OPP were 
applied for outlier detection and estimation. The usual Monte Carlo simulation proce- 
dures were used to compute the percentages of both correct and false identifications 
and the average and standard errors of the estimates. A synthetic measure of the dis- 
tance between the estimated and true outlier was obtained by computing the norm of 
the vector difference between the estimated and true outlier size. 

In the present context it seems of interest to report some results concerned with the 
validity of a dynamic factor model to fit the data. We divided the frequency interval 
(0,7r) into four sub-intervals of equal size, namely |], [f + y-, f ], [f + |?r], 
and [jK+^.k]. Each sub-interval included r/8 — 1 frequencies (here 24 frequencies 
as r = 200), and the sub-interval centers were |;r, ^K, and respectively. The 
null hypothesis, Hq. variance-covariance matrices are symmetric, tested using the LRT 
statistics of Theorem 14.21 was never rejected at significance level 5% neither in the 
presence of an isolated outlier nor an outlier patch. 

Table[2]shows that the number of factors [K — 4) was correctly estimated in almost 
all replications. 

For method ODFM we computed the N — K univariate projections obtained as linear 
combination of the multivariate time series 

w/,, = (vO'yr, i^l,...,N-K, 

where v,- is the eigenvector of the variance-covariance matrix ry{Q) of the observed 
multivariate time series associated with the eigenvalue A,. The eigenvalues were ar- 
ranged in ascending order, that is the eigenvectors vi , . . . , vn-k belong to the smallest 
eigenvalues Ai , . . . , Xn-k respectively. Then, the presence of an outlier in the multivari- 
ate time series {y,} was detected at time t if 
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for some (that is, at least one) 1 -dimensional time series {w/.r}- Wi and On-j were the 
Wi,t sample mean and standard deviation respectively. The threshold parameter ka may 
be computed according to the Tchebychev inequality. We chose the significance level 
a = 0.05 so that approximately ka =4.47. 

The parameters of the dynamic factor model were estimated along the guidelines 
given by Theorem lS.ll in Section |5] (see step 5(b) in Section|2]as well). The difference 
between the observed time series and the estimated dynamic factor model values was 
assumed to yield the estimate of the outlier size ^. Then outliers a were estimated in 
each factor by using Formula (2.2b) p. 194 of (Q)- The total outlier size was obtained 
as o) = ^+Aa. 

The OARMA method was implemented along the guidelines given by (23). A VAR 
model of order M = 4 was fitted to the multivariate time series {yt}. We assumed that 
only outliers of either additive or innovation type could be present. The Mahalanobis 
type statistic for either type of outliers 

Ji.h = {0}i,h)'^ll0)i^h 

was computed for each time t = h and / = / for lO and ; = A for AO. H,- /, denotes the 
covariance matrix of the estimator If the maximum across time of 7^ /, or 7/ /, exceeded 
their respective 95-th percentile, then either an AO or an lO was assumed at /z = /imax 
according to which statistics Jmax{i,hi) = niax/,7, /,, / = I, A, was the greatest. Then the 
outlier size was estimated and its effect removed from the multivariate time series. The 
procedure was iterated until no more outliers were found. Tables of percentiles of the 
statistics Jmaxil,hi) or 7max(-A,/iA) are available only up to dimension 10 (see (|23|) . 
Table 1 p. 797, and ^, Table 4 p. 664). So we computed empirical percentiles from 
10000 artificial multivariate time series generated by the dynamic factor model with 
A? = 20 and r = 200. We obtained JmaMJ^A) = 47.7911 and J,^^^{I,h,) = 46.6493 at 
the 5% significance level. 

The estimates of the outlier size were computed by using the Formulas (»/ /, for 
innovation outliers and 0)^,/, for additive outliers provided by (23) p. 794. 

As far as the OPP method is concerned the direction that maximizes the kurtosis of 
the linear projection of the multivariate time series {yt} and all orthogonal directions 
had to be computed. The direction that minimizes the kurtosis and its orthogonal direc- 
tions had to be computed as well. To compute these 2N projections we used the Matlab 
routines by (Il5b available on the web (http:// halweb.uc3m.es/fjp/download.html). In 
this case too we confined ourselves only to AO and lO. In this latter case, the procedure 
was applied to the multivariate residual time series {a,} computed by fitting a VAR of 
order M = 4 to {yt}- Then, each and every projection was searched for outliers by using 
the univariate counterpart of the statistic in the OARMA method. The maximum across 
time and across projections was computed and let A^ denote the maximum found on 
the projections of the multivariate time series {yt} and A/ denote the maximum found 
on the projections of the multivariate residual time series {a,}. Both A^i and A/ were 
compared with their appropriate thresholds and either an AO or an lO was detected if 
the greatest of A^i and A/ exceeded its threshold. As tables of percentiles of Aa and 
A/ are available only up to = 10 (see (8), Table 2, p. 663), we computed the 95-th 
percentiles by simulation from the dynamic factor model with N = 20 and T = 200. In 
this case we obtained after 10000 replications A^ = 5.9772 and A/ = 6.0392. 
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Table 3 

Isolated outlier percentage detection by the ODFM, OARMA and OPP methods 



ODFM 


OARMA 


OPP 




outlier id. percent 


outlier id. percentage 


outlier id. percentaj 




correct false 


correct AO correct 10 false 


correct AO correct 10 


false 


96.9 1.7 


2.56 94.64 100 


58.2 40.9 


4.3 


(3.5199) (2.3473) 


(2.21) (2.99) 


(8.1462) (8.2698) 


(4.2320) 



Once date and type of outliers were determined, to estimate their size we fitted a 
VAR model of order M = 4 to the multivariate time series {yt} and computed O)/,/, for 
innovation outliers and oJam for additive outliers as in (I23I) p. 794. 

The results obtained by applying the three procedures to discover the outlying ob- 
servation in the artificial data set with an isolated outlier are displayed in Table [3] The 
Monte Carlo statistics were computed on 1000 replications. In order to compute the 
standard errors of the estimates of the percentages of detection, we divided the replica- 
tions in 40 groups of length 25 each. The percentages were computed for each group 
of replications, then we could compute the standard error of each percentage by using 
40 values. The percentage of correct detection of the isolated outlier in f = 100 was 
greater than 95% for all methods. The standard errors are comparable as regards meth- 
ods ODFM and OARMA but the standard error is slightly larger for the OPP method. 
Both OARMA and OPP methods identified in most cases an outlier of innovation type 
possibly because the 10 allows for greater flexibility as far as fitting the observed data 
is concerned. On the other hand, if we constrained the OARMA and OPP methods to 
search for AO only, then the percentages of correct detection dropped dramatically. We 
recorded false outlier detections as well. The number of replications (percentage) where 
one or more wrong detections occurred was low for methods ODFM and OPP while 
the method OARMA wrongly detected outlying observations in every replication. As 
overall figures, the observations detected as outlying ones over 1000 replications (se- 
ries length 200) were 17, 80, and 9003 for the ODFM, OPP and OARMA methods 
respectively. 

The results for the data set with a patch at times t = 99, 100, 101 are displayed in 
Table ID Considering the outlier separately, the ODFM method detected each of the 
three outliers about 99%, while for the OARMA and OPP methods high percentages 
were recorded only for the detection of the first outlier in the patch. Again, these latter 
methods almost always identified the outlying observations as lO. As a consequence, 
the outliers in f = 100 and f = 101 could be well explained by the innovation outlier 
structure. The percentages of false identifications were rather low for the ODFM and 
OPP methods, while the OARMA method wrongly identified outliers in all 1000 mul- 
tivariate time series. The average number of false outliers was in this case about 7, that 
is less than in the case of multivariate time series with an isolated outlier. By consider- 
ing the patch as a whole, correct identification was performed 97.5% by ODFM, only 
about 58% by OARMA and never by OPP. At least an outlier in the patch was detected 
by ODFM, OARMA and OPP 100%, 85.09% and 85.4% respectively. 

Outlier size estimates are displayed in Table |5] for the case of the isolated outlier 
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Table 4 

Outlier patch percentage detection by the ODFM, OARMA and OPP methods 



Method ODFM OARMA OPP 

id. percent iden. percentage iden. percentage 







AO 


lO 


AO 


lO 


Outlier 


98.9 


0.1 


83.09 


0.0 


82.9 


t = 99 


(1.9975) 


(0.24) 


(3.79) 




(7.9492) 


Outlier 


99.8 


0.2 


63.72 


0.0 


1.4 


t= 100 


(0.8718) 


(0.42) 


(4.79) 




(2.1071) 


Outlier 


98.8 


4.34 


60.06 


2.9 


0.0 


/= 101 


(2.7129) 


(2.23) 


(4.69) 


(3.3451) 




False 


1.5 




100 




4.8 


detection 


(2.1331) 






(3. 


9192) 



and in Table |6] for the case of the outHer patch. The adequacy of the estimates was 
evaluated by the norm of the difference between the estimated and true outlier size 
vectors. This figure yields a kind of distance measure which accounts both for bias 
and variability of the estimates. We may remind that the outlier size is equal to 0.6 
for all component time series. For the ODFM method we distinguish the total outlier 
from its orthogonal part. This is of interest because the dynamic factor model structure 
is directly impacted by the array orthogonal to the columns of the matrix A, while 
the rest impacts the factors. For the other methods, OARMA and OPP, we distinguish 
between AO and lO identification. We may notice that as far as the ODFM method is 
concerned, the orthogonal part is close to its true counterpart while the estimated total 
outlier seems more variable. The distance is about 7 in all cases. As regards the other 
methods OARMA and OPP the distance between the estimated and true outlier size is 
smaller, approximately between 4 and 5. In the case of OPP this figure is even smaller 
(about 2.5) if the outlier is identified as AO. 

Such distances depend both on bias and variability. To distinguish between these 
two sources, we present the bias and standard errors of the estimates of O; in Table [T] 
for the case of isolated outlier. Figures are bias and standard errors of estimated outlier 
sizes in each component, over the correctly identified replications, and averaged on 
the 20 components. It may be seen that the proposed ODFM method provides less 
biased estimates of the outlier size, while the variability is larger than the projection 
pursuit method. Since the distances of the estimated t, from the true values are generally 
small (see Table |5] and |6l), we conclude that estimation of univariate outliers in the 
(estimated) factors, or interaction between them and the estimates of the factor matrix 
A, are responsible for such a larger variabiUty. This may be caused by the method 
employed in Theorem 15. II for estimating {x,}. A more efficient method is currently 
under investigation. 

9. Application to real data 

We used a quarterly economic data set which covers three countries, France, Germany 
and Italy, from 1960 to 1999, for illustrating method ODFM. The data are taken from 
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Table 5 

Deviation from true values of isolated outlier size estimates yielded by the ODFM, OARMA and OPP 

methods 



Method 


ODFM 


OARMA 


OPP 






K-Q 


\\&-(0\\ 


W&A-aW 


\\a),-w\\ 




11 01/ -0)11 


Outlier 


0.7805 


7.06 


3.3778 


4.5135 


2.5843 


4.3167 


f = 100 


(0.1364) 


(2.8675) 


(1.4531) 


(3.4293) 


(0.6581) 


(1.5351) 



Table 6 

Deviation from true values of outlier patch size estimates yielded by the ODFM, OARMA and OPP methods 



Method ODFM OARMA OPP 









ll«A-co|| 


||«7-(0|| 


W&A-COW 


II 01/ -0)11 


Outlier 


0.7817 


7.1861 


3.9729 


4.5533 




4.3767 


t = 99 


(0.1386) 


(2.8408) 


(1.29) 


(1.6458) 




(1.5117) 


Outlier 


0.7829 


7.1386 


3.9406 


6.1979 




4.6051 


? = 100 


(0.1381) 


(2.8656) 


(1.1391) 


(2.4194) 




(1.0134) 


OutHer 


0.7773 


7.2700 


4.0333 


5.1087 


2.4556 




t = m 


(0.1415) 


(2.7589) 


(1.4710) 


(1.9191) 


(0.4919) 





Table 7 

Bias and standard errors of the outlier size estimates, averaged on the 20 components (isolated outlier at 

t = lOOj 



Method 


ODFM 


OPP-AO 


OPP-IO 


correctly identified repl.ns 


969 


582 


409 


average bias 


0.014 


-0.373 


-0.231 


average standard error 


1.70 


0.46 


0.99 
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French (fl, German (g) and Italian (i) quarterly economic data and preliminary transformations 



Label Description Transform. Countries 



cpi 


Consumer price index 


A^ln 


f g 


ip 


Index of industrial production 


Aln 


f g 


Pgdp 


Gross domestic product deflator 


A- In 


g 


rbndl 


Interest rate of long-term government bonds 


A 


f g 


rbndm 


Interest rate of medium-term government bonds 


A 




rcommod 


Real commodity price index 


Aln 


f g 


rgdp 


Real gross domestic product 


Aln 


g 


roil 


Real oil prices 


Aln 


f g 


rgold 


Real gold prices 


Aln 


f g 


rstock 


Real stock price index 


Aln 


f g 


unemp 


Unemployment rate 


A 





the seven-country data set used by (1221) to compute combination forecasts of output 
growth. The preliminary transformations suggested in the aforementioned paper were 
applied. Then, 154 transformed data for each time series were available from III quarter 
1960 to IV quarter 1999. We considered only the time series indicated by 'a' in Table 
lb in dH. 

For France 7 series were available from 1960 to 1999, 9 series for Germany and 
1 1 series for Italy. Time series labels, description, transformations and countries are 
displayed in Table |8] 

The data set was composed of 27 time series of length T = 154. The number of 
factors was checked by using the eigenvalues of the variance-covariance matrix. The 
eigenvalues were computed and arranged in descending order The cumulated sum is 
plotted in Fig.|4] The smallest integer such that the cumulated sum exceeded 0.95 was 
6, so that we assumed the number of factor K — 6. 

The symmetry of the variance-covariance matrices computed for lags 1, 2, 3 and 4 
was checked along the guidelines given in Section |4] Two statistics for this test were 
computed by averaging the periodogram in two frequency bands, centered in A = 7i;/4 
the first one and in A = 37r/4 the second one. We obtained respectively 174.81 and 
143.88 which are not significant, so that the null hypothesis was not rejected. 

The identification stage required the computation of N — K = 14 projections that 
were searched for outlying observations along the guidelines given in Section |8] The 
projections that exceeded the thresholds computed by the Tchebychev inequality sug- 
gested the presence of outliers at times t ~ 23,24,37,63, 124. The graphical display of 
these projections in Fig. |5] shows that each projection disclosed an outlier separately, 
excepted the third projection (outliers at f = 23 and t = 24). 

Examination of the estimated outlier sizes allowed the time series that most con- 
tributed to the multivariate outlier to be discovered. We could see that the outlier at 
f = 23 (1/1966) was apparent in time series rbndl (g) and pgdp, rbndl, rstock, unemp 
(i). The outlier at f = 24 (11/1966) was mainly determined by roil (f), roil and rstock 
(g), and pgdp, rbndl, rstock, unemp (i). The outlier at f = 37 (III/1969) was mainly 
determined by rbndl (f), rbndl (g), and rbndl and unemp (i). The main influence on the 
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Fig 5. Projections of the multivariate time series where the presence of outliers may be detected 
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outlier at r = 63 (1/76) came from rbndl and rgold (f), rbndl and rgold (g), and rbndl, 
rbndm, rcommod, roil (i). The outlier at f = 124 (11/1991) was apparent only in time 
series rbndl (f). The interest rate of long-term government bonds (rbndl) was present 
in all outliers, that is this time series produced most of the unexpected values observed 
in the multivariate time series. Its effect was common to the three countries at f = 37 
and / = 63 while its effect was Umited to Germany and Italy at f = 23, Italy at / = 24, 
and France at ? = 124. Some time series were available for Italy only, so it seemed of 
interest to apply the procedure to the data set which included only the quarterly eco- 
nomic ItaUan data. We could fit a dynamic factor model with 3 factors and 4 outUers 
were found, at f = 23 (pgdp, rbndl, rbndm, rstock), t = 2A (pgdp, rbndl), ^ = 55 (rbndm, 
roil, rgold, rstock, unemp), and t = 103 (rbndl, rbndm, roil, rstock, unemp). Observa- 
tion at / = 55 corresponds to 1/1974 and t = 103 to 1/1986. By comparing the results 
for Italy with those for the three countries together we could argue that some outliers 
were 'international' while others regarded only one or two countries. So we had to 
consider outUers at / = 23 and f = 24 as 'international' while outUers at / = 55 and 
f = 103 as 'national'. Some of the time series that originated these two outliers, that 
is rbndm and unemp, were present in the Italian data set only. Similarly we could not 
consider the outlier at / = 124 as 'international' but 'national' limited to the quarterly 
economic French data. Note that the outlier at ? = 23 was not considered present in 
the French data, but dates f = 23 and r = 24 are close so that the outlying observations 
were possibly related to some common circumstance. 

10. Conclusions 

We presented a method to discover outliers in multivariate time series generated by a 
dynamic factor model. This method was found to yield best results compared to two 
other methods aimed at discovering outliers in multivariate time series in a different 
framework. Our method relies on the assumption that the multivariate time series is 
generated by a dynamic factor model, therefore the procedure to check the dynamic 
model adequacy for fitting the data should be carefully applied to ensure that genuine 
outUers could be discovered. If assumptions were carefully checked and requirements 
met, both the simulation experiment and the application to real data showed that the 
method presented in this article was effective for outlier detection and estimation, cau- 
tious against false outUer identification and, in addition, simple to implement. The es- 
timates of the total outlier size were found less biased, but more variable than those 
obtained by using the other two methods. On the contrary the estimates of the size 
of the part of outlier that impacts the dynamic factor model without affecting the fac- 
tors were found accurate and close to their respective true values. Improvements of the 
estimation method is currently subject of further research. 
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Appendix A: Appendix 
A.l. Proof of Theorem \2.1\ 

Proof. Suppose that x* ~ Cx, and Assumptions 1 and 2 hold, then 

X: = f 

where 0* = C&j and 0, = diag(0|p, . . . , 9^^^). But since the x*'s are mutually inde- 
pendent, each y = 0, 1, . . ., is diagonal, therefore Y^kCskSlr^ = 0, r ^ s. The 0/s 

are diagonal, therefore c^rBrP = 0, r ^ s. Since for each r at least one 0,i is non-zero 
(otherwise x, , would be equal to zero for each and every t) it follows that C too has to 
be diagonal. Thus, again from Assumption 1, 

r*(o) = cov(x;, (jc,*)') = cr,(o)c' = cc' = i, 

that is c|^, = 1, k= 1,. . . ,K, which implies C = / up to a change of sign. □ 
A.2. Proof of Theorem 13.71 

Proof Note first that, for h ^ 0, Tyih) = Ari(/z)A' has rank < K, therefore Tyi^h) has 
at least (A^ — K) eigenvalues equal to zero for any h ^0 (zero is not necessarily the 
smallest eigenvalue because ry{h) is not positive definite). 

If z'A = 0, then ry{h)z — Arx{h)A'z = 0, therefore z is eigenvector associated with a 
zero eigenvalue of each matrix r,.(/i) for any h ^ 0. 

On the other hand, let z be an eigenvector associated with a zero eigenvalue of each 
matrix Ty(\i\h. ^ (the set of such vectors is not empty because it includes V^). 
Then Yy{h)z = AYx{h)A'z = for any h and on multiplying by the generalized inverse 
(A'A)" 'a' we get Tx{h)A'z = 0. If we let c — A'z the last equation reads Tx{h)c = or, 
since Txih) is diagonal, 

Y^ih)cj = 0, j=l,2,...,K,h = l,2,--- 
which under our hypotheses implies cj = Q J = 1,2, ... ,K and therefore A'z = 0. □ 

A.3. Proof of Theorem \3.2\ 

Proof. We note first that 

ry(/?) = VuW'u+h , = 1,2, . . . ,S- 1 
11=1 

and Ty{h) = foi h > s. Moreover, the matrix [i/^i , 1/^2, ... , i/Av] has rank < sK{< N), 
thus there exists a vector z such that z'xj/u — for any u. For such a vector we have 

ry{h)z - r„ ( v/:+,z) =o,h=i,2,...,s-\ 
11=1 
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and it also follows that '^y{h) has a zero eigenvalue for each /i=l,2,...,s— 1. 
Suppose now that z is eigenvector associated with a zero eigenvalue of each TyQi), and 
write C(, = v/j,z. Then 

= r,. {s-l)z=Wi WsZ = Vi 
and since the rank of y/i is K it follows c, = 0. Now, 

and Cj_ 1 =0. On repeating the same argument for lags s ~ 3,s — 2,. . . ,2,1 we obtain 
Cj_2 = Ci_3 = . . . = C2 = 0. It remains to show that ci —0 also. From assumption (ii) 
for a fixed A: the rank of xj/i^ is K, and 

u=\ 

and it follows ci — 0, which completes the proof. □ 
A.4. Proof of Theorem \3.3\ 

Proof. Let us recall that rank(A4') — rank(A) — K. Then AA' has N — K eigenvalues 
equal to zero. We have that F,, (0) =AA' + a^I is symmetric and positive definite. From 
the relationship 

AA'-XI = F, (0) - (A + a^)I 

it follows that Fv(0) has minimum eigenvalues equal to with multiplicity N — K. 
Let z be a corresponding eigenvector Then, 

^^Yy{<d)z^^AA'z + a^, 

and \\z!A\\ = follows. On the other hand, if z'A = then Ty{0)z — AA'z+ G^z and z is 
eigenvector associated to the smallest eigenvalue O^. □ 

A.5. Proof of Theorem 13.41 

Proof. Since 

ry(o) = t 

u=l 

and for any v such that || v|p = 1 we have 

v/f,(o)v=x;iivV»ii'+(^'>(t' 

H=l 

it follows that the smallest eigenvalue of Fv(0) is at least a^. If xj/'uZ = 0,u~ l,2,...,s 
then 

Ty{0)z^'£Hf,iwlz) + cjh = a\ 



R.Baragona and F. BattagUa/OutUers in dynamic factor models 



426 



and z is eigenvector associated with a^, the smallest eigenvalue of rv(0). 

On the other hand, let z denote an eigenvector associated with the smallest eigenvalue 

CT^, it follows that z!Ty{Q)z = O^. Then 



Hence || \l>'i,z\\^ — 0, therefore y/^'z — for any u. 



□ 



A.6. Proof of Theorem \4.1\ 

Proof. If r, [h) is symmetric for any h we can write 



1 - - ^.-.-iXk^^ 
2n 



rv(o)+2^r,.(/!)cos(A/i) 



/! = -» 



which is real for any A. On the other side, if F{X) is real then F{X) — F{X) — F{~X) 
and 

Ty{h)' = Ty{-h) = J F{X)e-'^''dX = J F{(o)e''^^d(0 = Ty{h). 

□ 



A.7. Proof of Theorem \4.2\ 

Proof. Assuming normality, {X^,Xj) and {X^,Xl) for I < j < k < J are independent, 
therefore the unrestricted likelihood may be written 



-NJ 



ReF -ImF 
\mF ReF 



-J/2 ( J 

exp|-E((^;)',(^j)') 



ReF -ImF 
ImF ReF 



-1 



Xf 
4 



(A 



Therefore the maximum likelihood estimate of 
and the unrestricted maximum likelihood is 



ReF -ImF 
\mF ReF 



equals 



Sr —Si 

Sj Sr 



L = 7t 



-NJ 



Sr —Si 
Si Sr 



-J/2 



exp{-2NJ} . 



Under Hq, F = ReF, both Xf and are independendy A^(0, ^F), thus 

L{xf,xi,...,xf,xj)^n-'"\ReF\-'exp[-Jt[[{ReF)-'SR}} 
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On maximizing with respect to Re F, we find that the maximum likelihood estimate of 
ReF is Sr and the maximum equals 



Lo = n-'^-' \SrI-' exp{-2A?7} . 



The UkeUhood ratio is 



Using 



Sr —Si 
Si Sr 



7/2 



Sr —Si 
Si Sr 

it follows that the likelihood ratio is a monotonia function of the statistic 



U 



\Sr\ \SR + SiSjf^Si\ 

tl 

{SR'Slf\ 



\Sr\ \Sr + SiSj^ Si\ — 

and the rejection region is f/ < c. The distribution of U is analyzed in Chapter 8 of 
and corresponds in his notation to f/^v nj-n- i ■ CH 



A.8. Proof of Theorem\5j\ 

Proof. The first part of the theorem follows directly from proposition (4) in (fl2l) . p. 46. 

^ 1/2 

So we only have to prove that YX' = WxAfl . Let the singular value decomposition of 
Y be written as in proposition (1) in p. 60, that is 

Y = WDV'. 



The N X T matrix D may be written 



D 



1/2 



0, 



{r-K)xK 



^KxiT-r) 
0{,—K)x{T-r) 
^{N-r)xK ^{N-r)x{r-K) ^{N-r)x(T-r) 



A 



1/2 



r-K 



where A^_^ = diag(-\/A/f+i, . . . , and the A,'s are the singular values of Y'Y. The 
columns of the N x N orthogonal matrix W are the eigenvectors of YY' (arranged ac- 
cording to the non increasing order of the associated eigenvalues) and the columns of 
the T xT orthogonal matrix V are the eigenvectors of Y'Y. Note that some or all of the 
submatrices disappear when r = K, and/or r = N, and/or r = T. 

Let the matrix W be partitioned in such a way that it is conformable to the partitioned 
matrix D, that is 



W = 



W{1 -.K.l-.K) 
W{K+\: rA:K) 
W{r+ \ ■.N,l:K) 



W{l:K,K+l:r) W{1 : K,r+l : N) 
W{K+l:r,K+l:r) W {K + 1 : r,r+ 1 : N) 
W{r+l:N,K+l:r) W{r+l : N,r+l : N) 
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where VK(m ■.v,k:h)\s used to denote the submatrix of W which includes all entries wij 
with u<i<v and k < j < h. Multiplication by D yields 



WD 



W{1:K,1:K)a]/^ 



W{K+ 1 : r, 1 : K)aI/^ W{K+l ■.r,K+l: r)A'/^^ 
W{r+ l:N,l: K)a]/^ W{r+ 1 : N,K+l : r)AlL\ 







Kx{T-r) 



{r-K)x(T-r) 
(N-r)x(T-r) 



On the other hand, we have obviously X' [mi , . . . , uk] and multiplication of V' by X' 
yields 



y'X'= [Ml,...,Mr]'[Ml, 



0, 



Ik 

{T-K)xK 



Ik 

^{r-K)xK 
^{T-r)xK 



Some or all of the zero submatrices disappear when r — K and/or r —T. Summing up, 
we obtain 

YX' = WDV'X' = 



W{l:K,l:K)A'^'' W (l : K,K + I : r)Al.'_j. 
W{K+l:r,l: K)A^^'^ W {K +1 : r,K + I : r)Al'\ 



W{r+\:N,l:K)A^^^ W{r +1 : N ,K +\ : r)A'J_^f, 



1/2 



^Kx{T-r) 
^(r-K)x(T-r) 



(N-r)x(T-r) 



W{K+l: r,l:K)A]/^ 
W{r+l:N,l:K)Al/^ 
It is easily recognized that 



W(l -.K^l-.K) 
W{K+l:r,l:K) 
W{r+l ■.N,l:K) 



1/2 



so that the desired result 



follows. 



W{1 :K,1:K) 
W{K+1 :r,l:K) 
W{r+l ■.N,l:K) 



A = WkA^I^ 



Ik 

'^{r-K)xK 







{T-r)xK 



□ 



A.9. Proof of Theorem 16.71 

Proof. The result for the mean is obvious. For the variance we have, for example, 
var{r(7,,(0) - 7,,(0))} - E{(z^o + a),(z,,o -~Zs)}^ 

Then, by recalling that 

IE(7«(0))-7r.v(0)+O(r-i), 

(e.g.[l7l vol. 2, p. 693, Formula 9.5.4) the result follows. The proof for var{r(7„ (/i) — 
7„(/z))} is similar □ 
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A.IO. Proof of Theorem \6.2\ 

Proof. We have 

Re{/,,(A;) -/„.(A;)} = ^ +A„(A;) 

and 

Im{/„(A,)-7,,(A,)}=B„(1,), 

and obviously EA„(A) — EZ?„ (A) — for each r, s, and X. 
We evaluate now the second moments 

Y T T 

var\/rA„(A;) = , s^2t ^'^^ ^ cosA^(fo - f )cosA;(fo - m) 

X (a)r(z.vr - Z) + (Zrt - Z)«.v) («r(Zs« - z) + (z™ - z)fi)s)} 

" 7r;;W '^"^^^(^o - cos A;(fo - m) 

X {(O^Jssit - m) + «,^7rr(r - «) + «,-«v(7„ (f - m) + - f ))} 

1 

= o^Vt {a)^V,,(/!) + «27,,(/i) + a),a),(7„(/i) + 7„(-/!))}^cosA,vcosA,(v+/!), 

where we have put v = tQ — t and are neglecting end effects. If Xj is a Fourier frequency, 
^cosAjVCosA;(v + /!) — ^cos A;v(cos Ayvcos Ay/z — sinA;VsinA;7i) 

V V 

= COS Xjh ^ cos^ XjV — sin Xjh ^ cos Ay v sin A; v 

V v 

T 

= — COS A,7i, 

and finally 

1 r 

Var\/rA,,(A,) = ^ X! {«r 7s.(/!) + «.^7rr(/!) + «r«s(7n(/!) + 7«(-/!))}7COsA;/l 

^ ^{/"(^.O^' + + a),«,(/,,(A,-) +/,,(-A,))}. 

On repeating the same argument for Z?„ (Ay ) we obtain 

Y T T 

yaiVfBrsiXj) = — — ^sinAj(ro-f)sinAj(ro-M) 

X {CO^Yssit -u) + C0jYrr{t ~ u) - (Or(Os{yrs{t - «) + 7«(m - 0)}: 



R.Baragona and F. BattagUa/OutUers in dynamic factor models 



430 



and since 

^sin/ljVsinAj(v + /i) = ^sinAjv(sinAyVCOs Aj/i + cosAyVsin Ay/i) 

V V 

= COS Xjh ^ sin^ XjV + sin Xjh ^ sin Ay v cos XjV 

V V 

T 



— COS A,/!, 

2 



it follows 



VarVrB„(A,) ^ ^{U{Xj)(i>J + frr{Xj)(0^ - COrCO^iMXj) +M-Xj))}. 

Noting that 

VfRs{IrsiXj{T))-I,.,{Xj{T))} = VfA,-siXjiT)) 

and 

Vflm{UXjiT))-UXj{T))} = VfBrs{Xj{T)) 
the result follows. □ 

A.ll. Proof of Theorem 16.31 

Proof. Since 

F,,(A)-F,,(A) = ^ ^r(X-Xj){A{Xj)+iB{Xj)} + ^, 

j=-T/2 

we have 

varRe{F„(A) -F„(A)} - wr(A - A,)vvr (A - A,)cov{A,,(A,)A,,(A,)} 

and 

varIm{F,-,(A) - F,,(A ) } = (— )2 ^ ^ (A - A,) (A - Ai)cov{B,, (A, (A^.) } , 

j k 

and it may be easily seen that cov{A„(Aj)A„(At^)} and cov{B„(A/)B,-i(Ajt)} tend to 
zero as r ^ oo if Xj ^ X^ while the results for Xj — Xk are given in the previous the- 
orem. As r ^ oo, the window bandwidth tends to zero, (2;r/r)Iwr(A-A;)^ tends 
to cqM and therefore var{Re(iVi(A) — F„(A))} has the same asymptotic behavior as 
^^var(VrA„(A)) while var{Im(Frs(A) — F„(A))} has the same asymptotic behav- 
ior as 2p^var(\/rB,i(A)). On substituting the expressions for the variance the thesis 
follows. □ 
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